Implementation of DAWG
Abstract
Let T be a text over a fixed alphabet A. An automaton that accepts all substrings occurring in T can be constructed in linear time. The ratio of the size of an implementation of this automaton (the factor automaton, or DAWG) to the size of the input text is typically 14:1. This paper presents a method of implementing the DAWG that reduces this ratio to 4:1 while preserving the automaton's desirable properties: construction time linear in the length of the input text, and time linear in the length of the pattern to check whether the pattern occurs in the text.
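To make the two properties named in the abstract concrete, here is a minimal sketch of the textbook online construction of a factor automaton (also called a suffix automaton), together with a substring check. It is not the space-efficient 4:1 layout the paper describes, which this abstract does not detail. The names `State`, `build_dawg`, and `contains` are hypothetical, and the dictionary-based transitions are an assumption chosen for readability rather than compactness.

```python
class State:
    """One automaton state. Illustrative layout, not the paper's compact one."""
    __slots__ = ("next", "link", "len")

    def __init__(self, length=0, link=-1):
        self.next = {}      # outgoing transitions: symbol -> state index
        self.link = link    # suffix link (-1 for the initial state)
        self.len = length   # length of the longest string reaching this state


def build_dawg(text):
    """Build the factor automaton of `text` one symbol at a time."""
    states = [State()]      # index 0 is the initial state
    last = 0                # state that corresponds to the whole prefix read so far
    for ch in text:
        cur = len(states)
        states.append(State(states[last].len + 1))
        p = last
        # Add the new transition along the suffix-link chain where it is missing.
        while p != -1 and ch not in states[p].next:
            states[p].next[ch] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[ch]
            if states[p].len + 1 == states[q].len:
                states[cur].link = q
            else:
                # Split q: the clone takes over the shorter strings.
                clone = len(states)
                copy = State(states[p].len + 1, states[q].link)
                copy.next = dict(states[q].next)
                states.append(copy)
                while p != -1 and states[p].next.get(ch) == q:
                    states[p].next[ch] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    return states


def contains(states, pattern):
    """Return True if `pattern` occurs in the indexed text.

    Follows one transition per pattern symbol, so the work is
    proportional to the length of the pattern, not of the text.
    """
    s = 0
    for ch in pattern:
        s = states[s].next.get(ch)
        if s is None:
            return False
    return True
```

For example, after `dawg = build_dawg("abcbc")`, the call `contains(dawg, "bcb")` follows three transitions and succeeds, while `contains(dawg, "cc")` fails at the second symbol. The per-state dictionary and the two integer fields give a rough sense of why a straightforward implementation occupies many times the size of the text.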
Similar resources
Implementation of directed acyclic word graph
An effective implementation of a Directed Acyclic Word Graph (DAWG) automaton is shown. A DAWG for a text T is a minimal automaton that accepts all substrings of T, so it represents a complete index of the text. While usual implementations of the DAWG need about 30 times as much storage as the text itself, here we show an implementation that decreases this requirement d...
Dictionary Representation Using Efficient Dawg Implementation
The huge amount of information stored in a dictionary has increased the need for text compression. The amount of compression that can be obtained using current techniques is usually a tradeoff between speed and the amount of memory required. There is a considerable potential for savings to be made by the use of compression. Although hash tables are widely used, a trie structure is more appropria...
DNA assembly with gaps (Dawg): simulating sequence evolution
Motivation: Relationships amongst taxa are inferred from biological data using phylogenetic methods and procedures. Very few known phylogenies exist against which to test the accuracy of our inferences. Therefore, in the absence of biological data, simulated data must be used to test the accuracy of methods which produce these inferences. Researchers have limited or non-existent options for simu...
Direct Construction of Compact Directed Acyclic Word Graphs
The Directed Acyclic Word Graph (DAWG) is an efficient data structure to treat and analyze repetitions in a text, especially in DNA genomic sequences. Here, we consider the Compact Directed Acyclic Word Graph of a word. We give the first direct algorithm to construct it. It runs in time linear in the length of the string on a fixed alphabet. Our implementation requires half the memory space used by D...
Developing a Policy Framework for Digital Preservation
The Arts and Humanities Data Service (AHDS) has been established by the Joint Information Systems Committee of the UK's Higher Education Funding Councils to collect, preserve and promote re-use of digital resources which result from or support research and teaching in the arts and humanities. Within the UK, the Digital Archiving Working Group (DAWG) has been formed to co-ordinate research into ...
Publication date: 1998